Assessing the Risk of Crime in London Borough of Barnet

Hannah Chang, mentored by Martine Wauben

Contents

  1. Introduction

  2. Exploratory Data Analysis

  3. Spatial Autocorrelation

  4. Principal Component Analysis

  5. Multivariate Kriging Model

  6. Classic Machine Learning Model

  7. Conclusion

  8. Sources

Introduction

  • Can identifying environmental factors help inform the risk of crime at population level?

    • Risk Terrain Modelling by Metropolitan Police
  • Goal: to build models that predict crime from information about nearby places

  • Data

    • Street-level crime data from Police
    • Places of interest data from Open Street Map
      • 52 types of places, including shops, parking spaces, etc.

Exploratory Data Analysis

Overall Frequency

A total of 104,322 crimes were recorded in Barnet between April 2021 and March 2024.
In this period, anti-social behaviour was the most common type of crime, followed by violent crime.

Exploratory Data Analysis

Frequency in the Last 12 Months

However, in the last 12 months, violent crime has become the most common type of crime, which raises concerns.

Global Spatial Autocorrelation

Spatial autocorrelation measures the extent to which values in a spatial dataset are similar or dissimilar to their neighbours. Spatial autocorrelation can be positive (clustering), negative (dispersion) or zero (random).

After creating distance-based weights, a global spatial autocorrelation test was conducted for each type of crime.

Only anti-social behaviour showed significant spatial autocorrelation (p-value < 0.001), with a Moran's I statistic of 1.2E-03, which indicates weak positive autocorrelation.

Type | P Value | Moran's I
Anti-Social Behaviour | 0.000e+00 | 1.2e-03
Violent Crime | 6.545e-01 | -2.0e-04
Other | 5.789e-01 | -3.0e-04
Vehicle Crime | 3.330e-01 | 2.0e-04
Theft from the Person | 1.271e-01 | 2.3e-03
Burglary | 1.349e-01 | 1.9e-03

Changing the distance threshold did not change the finding.
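
As a rough illustration of the test described above, the sketch below computes global Moran's I with binary distance-band weights and a one-sided permutation p-value. It is written in plain Python for illustration only; the function names, the toy 5 x 5 grid and the threshold of 1.0 are assumptions, not the project's actual code or data.

```python
import math
import random


def distance_band_weights(coords, threshold):
    """Binary weights: w[i][j] = 1 if points i and j are within `threshold`."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= threshold:
                w[i][j] = 1.0
    return w


def morans_i(values, w):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


def permutation_test(values, w, n_perm=999, seed=0):
    """One-sided pseudo p-value: share of shuffles with I >= observed I."""
    rng = random.Random(seed)
    observed = morans_i(values, w)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, w) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Toy data (hypothetical): counts rise from west to east, so neighbours are similar.
grid = [(x, y) for x in range(5) for y in range(5)]
counts = [float(x) for x, _ in grid]
w = distance_band_weights(grid, threshold=1.0)
i_obs, p_value = permutation_test(counts, w)
print(round(i_obs, 3), p_value)
```

Re-running with a different `threshold` is one simple way to check whether a finding depends on the neighbourhood definition.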

Local Spatial Autocorrelation for ASB

As only anti-social behaviour (ASB) demonstrated spatial autocorrelation, the specific points that exhibit autocorrelation were identified.

Most points were not statistically significant. However, some cold spots were detected in Burnt Oak and East Finchley.

Quadrant | Count
High-High | 0
Low-Low | 1,534
High-Low | 4
Low-High | 0
Not Significant | 24,013
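
The quadrant labels above come from comparing each point's standardised value with the average of its neighbours (its spatial lag). The sketch below shows that classification step only, in plain Python with a hypothetical five-point example. It does not perform the significance test that separates "Not Significant" points from the rest, which would typically need a conditional permutation step.

```python
def local_moran_quadrants(values, w):
    """Label each point by the sign of its z-score and of its neighbours' mean z-score."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    z = [(v - mean) / sd for v in values]
    labels = []
    for i in range(n):
        row_sum = sum(w[i])
        # Spatial lag: row-standardised average of neighbouring z-scores.
        lag = sum(w[i][j] * z[j] for j in range(n)) / row_sum if row_sum else 0.0
        if z[i] > 0 and lag > 0:
            labels.append("High-High")
        elif z[i] < 0 and lag < 0:
            labels.append("Low-Low")
        elif z[i] > 0 and lag < 0:
            labels.append("High-Low")
        elif z[i] < 0 and lag > 0:
            labels.append("Low-High")
        else:
            labels.append("Undefined")  # value or lag sits exactly on the mean
    return labels


# Hypothetical example: five points on a line, each neighbouring the adjacent points.
values = [9.0, 8.0, 7.0, 1.0, 2.0]
w = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
quadrants = local_moran_quadrants(values, w)
print(quadrants)
```

Low-Low points are candidate cold spots and High-High points are candidate hot spots; only those that also pass the significance test would be counted as such.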

Kriging Model

Kriging is a method of spatial interpolation. It models the spatial relationship between points and, when predicting at a location, gives less weight to observations that are farther away from it. A kriging model creates a prediction surface based on location coordinates and other predictors.

The multivariate kriging model gave a predicted count of ASB ranging from 1.3 to 24.4.

The highest prediction was in North Finchley, close to many shops on the high street. A couple more hot spots were identified in Golders Green and Colindale.
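
To make the distance weighting concrete, here is a minimal sketch of ordinary kriging on coordinates only, in plain Python. It is a deliberately simplified stand-in for the multivariate model described above: it uses no additional predictors, and the exponential variogram with `sill=1.0` and `range_=2.0` is an assumed choice rather than a fitted one. All names and data are hypothetical.

```python
import math


def variogram(h, sill=1.0, range_=2.0):
    """Exponential variogram without nugget (assumed parameters)."""
    return sill * (1.0 - math.exp(-h / range_))


def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(coords, values, target):
    """Return the prediction at `target` and the kriging weights."""
    n = len(coords)
    # Variogram between observations, bordered by the unbiasedness constraint.
    a = [[variogram(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(c, target)) for c in coords] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    return sum(wt * v for wt, v in zip(weights, values)), weights


# Hypothetical observations at the corners of a square; predict near one corner.
coords = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
values = [12.0, 3.0, 5.0, 2.0]
prediction, weights = ordinary_kriging(coords, values, target=(1.0, 1.0))
print(round(prediction, 2), [round(wt, 3) for wt in weights])
```

In this example the weights sum to one and the nearest observation receives the largest weight, which is the behaviour the description above refers to.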

Kriging Model Evaluation

Overall, the kriging model had a root mean square error (RMSE) of 12.1.

The error ranged from -20.5 to 259.5, with a median of -0.1 and a mean of 3.3. Taking an absolute error below 10 as an acceptable estimate, the model predicts the number of ASB crimes fairly well at most locations.

However, the model was not able to capture spots with high ASB counts. The greatest error was found along the North Circular Road in East Finchley, near St Pancras and Islington Cemetery. While the model predicted about 3.5 ASB crimes based on the surrounding places, 163 ASB crimes were recorded there over three years.
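
The evaluation figures above can be summarised along these lines. This is a plain Python sketch that assumes error is defined as observed minus predicted, so that a positive error means under-prediction. The first three observation pairs are made up for illustration; only the last pair (163 observed, 3.5 predicted) echoes the example in the text.

```python
import math
import statistics


def error_summary(observed, predicted, tolerance=10.0):
    """Summarise prediction errors, defined here as observed minus predicted."""
    errors = [o - p for o, p in zip(observed, predicted)]
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / len(errors)),
        "mean": statistics.mean(errors),
        "median": statistics.median(errors),
        "min": min(errors),
        "max": max(errors),
        # Share of locations whose absolute error is below the tolerance.
        "share_within_tolerance": sum(abs(e) < tolerance for e in errors) / len(errors),
    }


observed = [2.0, 5.0, 3.0, 163.0]
predicted = [2.5, 4.0, 3.5, 3.5]
summary = error_summary(observed, predicted)
print(summary)
```

A single hot spot inflates the RMSE and the mean while leaving the median close to zero, the same pattern as in the error figures reported above.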

Kriging Model Evaluation

Test and Train Error

Before building the kriging model, principal component analysis (PCA) was performed. Rather than building the kriging model with 52 predictors, seven principal components, which retained about 70% of the information in the predictors, were used to make predictions.

The train error is negligible, as its value is extremely small.

The test error, on the other hand, varies with the number of principal components.

Keeping 4 or 6 components appears to be a reasonable choice, as the RMSE is fairly low in both cases.

For easier interpretation, 4 was chosen as the number of components for the PCA model.

Principal Component Analysis

Principal Component Analysis (PCA) is a linear dimensionality reduction method. The data is linearly transformed onto a new coordinate system whose axes, the principal components, capture the largest variation in the data in decreasing order.

Dimensions 1 to 7 together explained about 70% of the total variance.
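
The share of variance explained by the leading dimensions can be computed from the eigenvalues of the covariance matrix. The sketch below does this in plain Python with power iteration and deflation, which is an illustrative substitute for a library PCA routine. The three-column toy data and all function names are hypothetical, and in practice the predictors would usually be standardised first.

```python
import random


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
            for a in range(p)]


def eigenvalues_by_power_iteration(matrix, iterations=500, seed=0):
    """Eigenvalues of a symmetric matrix, largest first, via deflation."""
    rng = random.Random(seed)
    p = len(matrix)
    m = [row[:] for row in matrix]
    found = []
    for _ in range(p):
        v = [rng.random() for _ in range(p)]
        for _ in range(iterations):
            mv = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sum(x * x for x in mv) ** 0.5
            if norm == 0:
                break
            v = [x / norm for x in mv]
        # Rayleigh quotient of the converged direction.
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(p)) for i in range(p))
        found.append(lam)
        for i in range(p):  # remove the component just found
            for j in range(p):
                m[i][j] -= lam * v[i] * v[j]
    return found


def cumulative_explained_variance(rows):
    cov = covariance_matrix(rows)
    total = sum(cov[i][i] for i in range(len(cov)))  # trace = total variance
    running, out = 0.0, []
    for lam in eigenvalues_by_power_iteration(cov):
        running += lam
        out.append(running / total)
    return out


def components_needed(cumulative, target):
    """Smallest number of components whose cumulative share reaches `target`."""
    return next(k + 1 for k, share in enumerate(cumulative) if share >= target)


# Toy data: the second column is exactly twice the first, so it adds no new dimension.
rows = [(0, 0, 1), (1, 2, -1), (2, 4, 0), (3, 6, 1), (4, 8, -1)]
cumulative = cumulative_explained_variance(rows)
print([round(c, 3) for c in cumulative], components_needed(cumulative, 0.70))
```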

Principal Component Analysis

Dimension | Top 10 Contributors | Main Theme
1 | Distance to nearest: car repair shop, electronics shop, money exchange and transfer, garages, vet, houseware shop, gas station, liquor shop, and launderette | Urban outskirt
2 | Distance to nearest: bakery, bank, clothes, lawyer’s office, bridge, real estate agent, post-secondary institution (e.g., college or university), ATM machines, post office, and convenience store | High streets
3 | Distance to nearest: bar, clinic, doctor’s office, houseware shop, aesthetics shop (beauty), lawyer’s office, post-secondary institution, hospital, post office, and vet | Healthcare setting
4 | Distance to nearest: post depot, garage, grave yard, post office, warehouse, car wash, car dealer, aesthetics, and clothing shop | Urban outskirt

Random Forest Model

Linear regression, support vector machine and random forest models were compared using five-fold cross-validation; the random forest model had the lowest RMSE.

The 10 most important features in the random forest model were proximity to the nearest: clothing shop, bicycle parking, houseware shop, parking lot, bank, pharmacy, convenience store, fuel station, restaurant, and community centre. These are a mix of features from high streets and urban outskirts.
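
The five-fold comparison described above follows a general pattern that can be sketched in plain Python. The two models here, a mean baseline and a nearest-neighbour predictor, are simple stand-ins chosen to keep the example self-contained; they are not the linear regression, support vector machine or random forest models used in the project, and the data is made up.

```python
import math
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle the row indices and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_rmse(fit, x, y, k=5, seed=0):
    """Pool squared errors over all held-out folds and return the RMSE."""
    squared_errors = []
    for test_idx in k_fold_indices(len(x), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(x)) if i not in held_out]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        squared_errors += [(y[i] - predict(x[i])) ** 2 for i in test_idx]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def fit_mean(train_x, train_y):
    """Baseline: always predict the training mean."""
    mean = sum(train_y) / len(train_y)
    return lambda row: mean


def fit_nearest_neighbour(train_x, train_y):
    """Predict the target of the closest training row."""
    def predict(row):
        nearest = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], row))
        return train_y[nearest]
    return predict


# Hypothetical data: one predictor with a linear relationship to the target.
x = [(float(i),) for i in range(20)]
y = [2.0 * i for i in range(20)]
rmse_mean = cross_validated_rmse(fit_mean, x, y)
rmse_nn = cross_validated_rmse(fit_nearest_neighbour, x, y)
print(round(rmse_mean, 2), round(rmse_nn, 2))
```

Using the same folds (same `seed`) for every model keeps the comparison like for like.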

Random Forest Model

The ASB counts predicted by the random forest model are broadly similar to those of the kriging model, but with a slightly wider range.

The highest prediction, around 40 ASB cases, was in Chipping Barnet.

Random Forest Model Evaluation

As with the kriging model, the random forest model was not able to predict the points with high numbers of ASB crimes.

Random Forest Outlier Analysis

Values above 50

The highest error was observed in North Cricklewood around Pennine Mansions. The point is located within blocks of flats and is in the vicinity of some shops and bus stations on the street.

Summary

  • Anti-social behaviour (N = 25,551) was the most prevalent type of crime in Barnet between April 2021 and March 2024. However, in the last 12 months between April 2023 and March 2024, violent crime (N = 7,217) was the most common type of crime, followed by anti-social behaviour (N = 6,912), which raises concerns.

  • Of all crime types, only anti-social behaviour demonstrated statistically significant spatial autocorrelation. That is, areas near locations where anti-social behaviour took place are also likely to have anti-social behaviour. Changing the distance threshold for classifying neighbouring points did not change the finding.

  • Distances to each nearest place of interest were summarised into seven components by PCA. Evaluation of multivariate kriging model later showed that four components are sufficient for the model to perform. The four components primarily captured urban outskirts, high streets and healthcare settings.

  • The kriging model predicted anti-social behaviour counts ranging from 1.3 to 25.5. Compared with the actual counts, it captured fairly well the locations where anti-social behaviour was infrequent. Nonetheless, the model captured hot spots poorly. This could be due to high resident or transient population density in those areas.

  • Amongst linear regression, support vector machine and random forest models, random forest performed best. Distances to the nearest clothes shop, bicycle parking space, houseware store, car parking lot, bank, pharmacy, convenience store, gas station, restaurant, and community centre were the ten most important features in the random forest model. However, as with the kriging model, the random forest model was not able to capture hot spots of anti-social behaviour. Given that the locations with the highest errors were either in blocks of flats or near a tube station, it is also likely that the model's underestimation arises from the lack of adjustment of crime counts for population density.

Limitations & Future Project Directions

  1. Lack of adjustment for population density

    • Adjustment by population traffic with footfall data

    • Subset crime points around tube stations and adjust traffic volume with tap in and out data by TfL

  2. Weak spatial autocorrelation

    • Subset different time range or different areas
  3. Data Quality

    • Crime data

      • Biased patterns in patrol, leading to more records of crime near police station

      • Retroactive update of crime count

        • Limiting the data import to the second or third latest month
    • Places data

      • May not capture every relevant feature on the ground

        • Scoping places with safety neighbourhood team

Strengths & Lessons Learned

  1. The kriging model was able to make predictions at locations where no anti-social behaviour was observed during the period of investigation.

  2. Both the kriging and random forest models were able to identify cold spots of anti-social behaviour within the London Borough of Barnet. This may be of use to the Metropolitan Police in optimising resource allocation.

  3. Lessons learned include:

    • Importance of version control & utilising saveRDS()

    • Importance of evaluation to test model performance

End

📝Github Repository: https://github.com/hamchang95/arc_lbb

📍Project Website: https://hamchang95.github.io/arc_lbb_website/

📩Contact: hannah.chang@barnet.gov.uk